tuhh-softsec/A-Manually-Curated-Dataset-of-Vulnerability-Introducing-Commits-in-Java

A Manually Curated Dataset of Vulnerability Introducing Commits In Java

Research on identifying vulnerabilities and the commits that introduce them is ongoing. However, many current methods rely heavily on automation, which can lead to a high rate of false positives and require significant error-checking. To address this issue, we developed a tool-assisted pipeline for manually reviewing vulnerabilities and their corresponding commits. We also collected relevant metadata, such as the modified lines of code and the mapping between CVE entries and CWE categories. This dataset can be used to validate automated methods such as machine learning approaches.

DOI

Dataset Description

The complete dataset can be found here.

It is structured as a JSON file with the following fields:

JSON Fields

Fieldname      Brief
cwe            Common Weakness Enumeration ID
introducing    Commit hash that introduces the vulnerability
intro_stats    Number of lines added/deleted in the introducing commit
intro_lines    Lines marked as vulnerable in the introducing commit
fixing_stats   Number of lines added/deleted in the fixing commits
fixing_lines   Lines marked as fixing the vulnerability in the fixing commit
days_between   Days between the identified introducing and fixing commits

Example

{
  "cve": "CVE-2019-11274",
  "cwe": "CWE-79",
  "repository": "https://github.com/cloudfoundry/uaa",
  "fixing": [
    "a34f55fc97a81966faf21e3ae404ec24f1f31cf7"
  ],
  "introducing": "bb8ff8f4e8969b46fdacffcd27781d223c8c7244",
  "intro_stats": {
    "bb8ff8f4e8969b46fdacffcd27781d223c8c7244": {
      "add": 320,
      "del": 7
    }
  },
  "fixing_stats": {
    "a34f55fc97a81966faf21e3ae404ec24f1f31cf7": {
      "add": 68,
      "del": 17
    }
  },
  "days_between": 1836,
  "fixing_lines": {
    "server/src/main/java/org/cloudfoundry/identity/uaa/scim/endpoints/ScimGroupEndpoints.java": "168"
  },
  "introducing_lines": {
    "scim/src/main/java/org/cloudfoundry/identity/uaa/scim/endpoints/ScimGroupEndpoints.java": "190"
  }
}
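Reading an entry

As a rough illustration of how the fields fit together, the following Python sketch parses one entry and derives a few headline numbers. It assumes the dataset file is a JSON array of objects shaped like the example; the summarize helper and the SAMPLE string are hypothetical and not part of the repository. Because the field table lists intro_lines while the example uses introducing_lines, the sketch accepts either key.

```python
import json

# One entry taken from the example above, wrapped in a list. That the dataset
# file is a JSON array of such objects is an assumption.
SAMPLE = """
[
  {
    "cve": "CVE-2019-11274",
    "cwe": "CWE-79",
    "repository": "https://github.com/cloudfoundry/uaa",
    "fixing": ["a34f55fc97a81966faf21e3ae404ec24f1f31cf7"],
    "introducing": "bb8ff8f4e8969b46fdacffcd27781d223c8c7244",
    "intro_stats": {
      "bb8ff8f4e8969b46fdacffcd27781d223c8c7244": {"add": 320, "del": 7}
    },
    "fixing_stats": {
      "a34f55fc97a81966faf21e3ae404ec24f1f31cf7": {"add": 68, "del": 17}
    },
    "days_between": 1836,
    "fixing_lines": {
      "server/src/main/java/org/cloudfoundry/identity/uaa/scim/endpoints/ScimGroupEndpoints.java": "168"
    },
    "introducing_lines": {
      "scim/src/main/java/org/cloudfoundry/identity/uaa/scim/endpoints/ScimGroupEndpoints.java": "190"
    }
  }
]
"""


def summarize(entry):
    """Collect a few headline numbers for one entry (hypothetical helper)."""
    # intro_stats is keyed by the introducing commit hash.
    intro_stats = entry["intro_stats"][entry["introducing"]]
    # fixing_stats may hold several commits, so totals are summed.
    fixing_stats = entry["fixing_stats"].values()
    # The field table says "intro_lines", the example says "introducing_lines";
    # accept either spelling.
    intro_lines = entry.get("introducing_lines", entry.get("intro_lines", {}))
    return {
        "cve": entry["cve"],
        "cwe": entry["cwe"],
        "intro_added": intro_stats["add"],
        "intro_deleted": intro_stats["del"],
        "fix_added": sum(s["add"] for s in fixing_stats),
        "fix_deleted": sum(s["del"] for s in fixing_stats),
        "years_until_fix": round(entry["days_between"] / 365.25, 1),
        "vulnerable_files": sorted(intro_lines),
    }


entries = json.loads(SAMPLE)
summary = summarize(entries[0])
print(summary)
```

To run the same helper over the real file, replace json.loads(SAMPLE) with json.load on the opened dataset file, provided the top-level structure matches the assumption above.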

Review Pipeline Instructions

Prerequisites

Software                Used Version
Python3                 3.10.8
pip3                    22.3.1
git                     2.29.0
Web browser of choice   Safari 16.1

Setup

To install all required Python packages, run the following command inside the review_pipeline directory:

  • python3 -m pip install -r requirements.txt

Usage

The pipeline can be executed with the following command inside the review_pipeline directory:

  • python3 manual_analysis_pipeline.py <path_to_input_dataset>

Input Dataset

The input dataset is expected to be a JSON file with the following fields:

Fieldname        Brief
cve_id           CVE ID of the vulnerability
repository       URL to the repository
fixing_commits   List of fixing commit SHA-1 hashes
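
Preparing an input file

The fields above can be checked before starting a review session. The sketch below is only an illustration: it assumes the input file is a JSON array of objects with exactly these three fields and that each hash is a full 40-character SHA-1. The validate_entry helper is hypothetical and not part of the pipeline, and the sample values are reused from the dataset example.

```python
import json
import re

# A full SHA-1 hash is 40 lowercase hexadecimal characters.
SHA1_RE = re.compile(r"[0-9a-f]{40}")
REQUIRED = ("cve_id", "repository", "fixing_commits")


def validate_entry(entry):
    """Return a list of problems found in one input entry (empty if none)."""
    problems = [f"missing field: {name}" for name in REQUIRED if name not in entry]
    commits = entry.get("fixing_commits", [])
    if not isinstance(commits, list):
        problems.append("fixing_commits must be a list")
    else:
        for commit in commits:
            if not (isinstance(commit, str) and SHA1_RE.fullmatch(commit)):
                problems.append(f"not a full SHA-1 hash: {commit}")
    return problems


# Assumed top-level layout: a list of entries.
entries = [
    {
        "cve_id": "CVE-2019-11274",
        "repository": "https://github.com/cloudfoundry/uaa",
        "fixing_commits": ["a34f55fc97a81966faf21e3ae404ec24f1f31cf7"],
    }
]

for entry in entries:
    for problem in validate_entry(entry):
        print(f"{entry.get('cve_id', '<unknown>')}: {problem}")

# Text that could be saved to a file and passed as <path_to_input_dataset>.
print(json.dumps(entries, indent=2))
```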
